Policy Iteration Based on Stochastic Factorization
Authors
Abstract
When a transition probability matrix is represented as the product of two stochastic matrices, one can swap the factors of the multiplication to obtain another transition matrix that retains some fundamental characteristics of the original. Since the derived matrix can be much smaller than its precursor, this property can be exploited to create a compact version of a Markov decision process (MDP), and hence to reduce the computational cost of dynamic programming. Building on this idea, this paper presents an approximate policy iteration algorithm called policy iteration based on stochastic factorization, or PISF for short. In terms of computational complexity, PISF replaces standard policy iteration’s cubic dependence on the size of the MDP with a function that grows only linearly with the number of states in the model. The proposed algorithm also enjoys nice theoretical properties: it always terminates after a finite number of iterations and returns a decision policy whose performance only depends on the quality of the stochastic factorization. In particular, if the approximation error in the factorization is sufficiently small, PISF computes the optimal value function of the MDP. The paper also discusses practical ways of factoring an MDP and illustrates the usefulness of the proposed algorithm with an application involving a large-scale decision problem of real economic interest.
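To make the factor-swapping idea concrete, here is a minimal NumPy sketch (illustrative only, not the paper's code): if an n-by-n transition matrix P factors as P = DK, with D (n x m) and K (m x n) both row-stochastic, then the swapped product KD is an m x m transition matrix, and dynamic programming on it is far cheaper when m is much smaller than n.

```python
# Minimal sketch of the factor-swapping trick; all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 20  # n original states, m << n states in the compact model

def random_stochastic(rows, cols, rng):
    """Return a row-stochastic matrix (every row sums to 1)."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=1, keepdims=True)

D = random_stochastic(n, m, rng)  # n x m stochastic factor
K = random_stochastic(m, n, rng)  # m x n stochastic factor

P = D @ K        # n x n transition matrix of the original MDP
P_small = K @ D  # m x m transition matrix: the compact model

# Both products are valid transition matrices:
assert np.allclose(P.sum(axis=1), 1.0)
assert np.allclose(P_small.sum(axis=1), 1.0)
```

Standard policy iteration's evaluation step on P solves an n x n linear system, which is the cubic dependence the abstract refers to; working with the m x m model instead is what yields the linear dependence on the number of states.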
Similar resources
Reinforcement Learning using Kernel-Based Stochastic Factorization
Kernel-based reinforcement learning (KBRL) is a method for learning a decision policy from a set of sample transitions that stands out for its strong theoretical guarantees. However, the size of the approximator grows with the number of transitions, which makes the approach impractical for large problems. In this paper we introduce a novel algorithm to improve the scalability of KBRL. We resor...
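The snippet is cut off, but the kernel construction it alludes to can be sketched as follows. This is a hedged illustration only, assuming a Gaussian kernel, a single action, and made-up variable names: row-normalized kernel matrices between sampled and representative states play the role of the two stochastic factors.

```python
# Hedged sketch of kernel-based stochastic factorization for one action:
# build stochastic factors K and D from sample transitions, then form a
# compact m-state transition model K @ D. Names are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n, m, tau = 500, 15, 0.5  # n sample transitions, m representative states

s = rng.random((n, 1))                                          # sampled states
s_next = np.clip(s + 0.1 * rng.standard_normal((n, 1)), 0, 1)   # next states
s_bar = rng.random((m, 1))                                      # representatives

def normalized_kernel(X, Y, tau):
    """Row-normalized Gaussian kernel matrix between point sets X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * tau ** 2))
    return W / W.sum(axis=1, keepdims=True)

K = normalized_kernel(s_bar, s, tau)       # m x n, row-stochastic
D = normalized_kernel(s_next, s_bar, tau)  # n x m, row-stochastic
P_small = K @ D                            # m x m compact transition model
assert np.allclose(P_small.sum(axis=1), 1.0)
```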
Policy iteration based feedback control
It is well known that stochastic control systems can be viewed as Markov decision processes (MDPs) with continuous state spaces. In this paper, we propose to apply the policy iteration approach in MDPs to the optimal control problem of stochastic systems. We first provide an optimality equation based on performance potentials and develop a policy iteration procedure. Then we apply policy iterat...
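For orientation, the textbook policy iteration loop that such procedures build on looks roughly like this (a generic tabular sketch, not the paper's potential-based variant). Note that the exact evaluation step solves a linear system, which is where the cubic cost mentioned in the main abstract arises.

```python
# Generic tabular policy iteration for a finite MDP (textbook version).
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """P: (A, S, S) transition tensor, R: (S, A) rewards. Returns (policy, V)."""
    A, S, _ = P.shape
    pi = np.zeros(S, dtype=int)  # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly
        # (an O(S^3) linear solve -- the cost PISF's compact model avoids).
        P_pi = P[pi, np.arange(S), :]   # S x S, row s follows action pi[s]
        R_pi = R[np.arange(S), pi]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        pi_new = Q.argmax(axis=1)
        if np.array_equal(pi_new, pi):  # stable policy: done
            return pi, V
        pi = pi_new
```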
Tree-Based On-Line Reinforcement Learning
Fitted Q-iteration (FQI) stands out among reinforcement learning algorithms for its flexibility and ease of use. FQI can be combined with any regression method, and this choice determines the algorithm’s statistical and computational properties. The combination of FQI with an ensemble of regression trees gives rise to an algorithm, FQIT, that is computationally efficient, scalable to high dimen...
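As a concrete sketch of the FQI loop described here, one can pair the Bellman backup with scikit-learn's ExtraTreesRegressor standing in for the tree ensemble; the function name and data layout below are assumptions, not the paper's code.

```python
# Sketch of fitted Q-iteration with an ensemble of extremely randomized trees.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(S, A, R, S_next, n_actions, gamma=0.95, n_iters=50):
    """S: (N, d) states, A: (N, 1) actions, R: (N,) rewards, S_next: (N, d).
    Terminal-state handling is omitted for brevity."""
    X = np.hstack([S, A])  # regress Q on (state, action) features
    y = R.copy()           # first iteration: Q is the immediate reward
    model = None
    for _ in range(n_iters):
        model = ExtraTreesRegressor(n_estimators=50).fit(X, y)
        # Bellman backup: y <- r + gamma * max_a' Q(s', a')
        q_next = np.column_stack([
            model.predict(np.hstack([S_next, np.full((len(S_next), 1), a)]))
            for a in range(n_actions)
        ])
        y = R + gamma * q_next.max(axis=1)
    return model
```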
Stochastic Assessment of Voltage Sags in Distribution Networks
This paper compares the fault position and Monte Carlo methods, the two most common methods for the stochastic assessment of voltage sags. To compare their abilities, symmetrical and unsymmetrical faults with different probability distributions of fault positions along the lines are applied in a test system. The voltage sag magnitude at different nodes of the test system is calculated. The problem with the...
Approximate Policy Iteration for several Environments and Reinforcement Functions
We present an approximate policy iteration algorithm to find stochastic policies that optimize single-agent behavior for several environments and reinforcement functions simultaneously. After introducing a geometric interpretation of policy improvement for stochastic policies, we discuss approximate policy iteration and evaluation. We present examples for two blockworld environments and reinforcem...
Journal: J. Artif. Intell. Res.
Volume: 50
Pages: -
Publication year: 2014